In a fast-moving digital landscape where readers skim headlines and scroll through feeds, only certain news articles manage to break through the noise and capture attention. These articles become popular not just by being read but by being shared, liked, and circulated online. The number of shares an article receives can determine how far it travels and how much influence it holds. For readers, this means the stories that appear at the top of their feed often guide what they see and think about. For editors and writers, it means that understanding what drives virality is essential to keeping journalism relevant in an online environment. This project explores how features like article length, multimedia, sentiment, and topic relate to article shares, offering insights into the types of pieces that become popular.

I became interested in the analytics of digital journalism from my experiences working on The Cavalier Daily, UVA’s independent student newspaper. As a writer and editor, I saw how The Cavalier Daily’s social media presence influenced its views and audience engagement, as well as how popular articles shaped opinions and daily conversations across UVA. I grew curious about which features of an article contribute to the article’s circulation. For example, I wondered whether an article’s structure (e.g., length) affected its popularity more than its content (e.g., topic).

In this project, I use a range of data visualizations to explore how structural- and content-based factors relate to article popularity, measured by the number of shares an article gets. The quantitative variables in this analysis include the number of shares, title sentiment, content sentiment, polarity, subjectivity, and number of keywords. The qualitative variables include the day of publication, article topic, presence of videos, presence of images, and keyword type. I initially use simple data visualizations to explore three main features – time of publication, article length, and article topic – then expand this exploration to other features, such as article subjectivity and use of keywords. This project uses metadata of 500 online articles from Mashable, a digital media platform, collected over the span of two years (Fernandes et al., 2015).

How many times do articles usually get shared?

Before exploring how specific article characteristics relate to popularity, it is important to understand the overall distribution of article shares in the dataset. Examining the density of share counts reveals the underlying structure of engagement on the platform, such as whether most articles receive similar levels of attention or whether popularity is dominated by a small number of viral outliers.

The distribution of article shares in the dataset is right-skewed, with a concentration of articles receiving relatively modest engagement and a tail extending toward medium-to-high share values. Most articles cluster around the lower end of the scale (roughly within the first 1,000 shares), indicating that a typical article draws only a limited audience. This view helps contextualize all subsequent analyses.
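One quick numerical check for this kind of right skew is to compare the mean with the median and to count how many articles fall below a threshold. The Python sketch below is only an illustration, independent of the code behind the visualizations: the share counts are made up rather than drawn from the Mashable data, and the function name `skew_summary` is hypothetical.

```python
from statistics import mean, median

# Hypothetical share counts for ten articles (not real Mashable data).
shares = [350, 480, 620, 700, 810, 900, 950, 1200, 2400, 15000]


def skew_summary(values, threshold=1000):
    """Return the mean, the median, and the fraction of values below `threshold`."""
    below = sum(1 for v in values if v < threshold)
    return {
        "mean": mean(values),
        "median": median(values),
        "frac_below": below / len(values),
    }


summary = skew_summary(shares)
print(summary)
```

In a right-skewed distribution, a few very large values pull the mean well above the median, which is why the median is often the more representative summary of a typical article.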

Does it matter which day an article is published?

This scatterplot shows the relationship between publication day and article shares, allowing users to explore whether publication timing may influence article popularity. By depicting the spread of articles published on each day, the scatterplot makes it clear whether certain days have highly variable performance versus more consistent engagement. The average number of shares for each day is depicted by the yellow points, and the value of the highest average is annotated and marked by a horizontal line. These markers combine individual-level detail with summary statistics, encouraging users to explore within-day distributions and make comparisons across days of the week.

The scatterplot reveals that the spread of article shares is fairly similar across all seven days, with Tuesday showing the most spread. The higher concentration of points on Tuesday, Wednesday, and Thursday suggests that more articles were published midweek. The yellow points show that articles published on Saturday have a higher average share count than articles published on any other day of the week. While this difference is modest, it suggests that publication day may be a relevant feature contributing to article popularity.
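The per-day averages marked by the yellow points amount to a group-by-day mean, with the annotated value being the largest of those means. A minimal Python sketch of that calculation follows, assuming each article is reduced to a (day, shares) pair. The records are invented for illustration and `mean_shares_by_day` is a hypothetical helper, not the code used for the plot.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (publication day, shares) records.
articles = [
    ("Tuesday", 900), ("Tuesday", 5200), ("Tuesday", 400),
    ("Wednesday", 1100), ("Wednesday", 700),
    ("Saturday", 2600), ("Saturday", 3000),
]


def mean_shares_by_day(records):
    """Group share counts by day and return the mean for each day."""
    grouped = defaultdict(list)
    for day, shares in records:
        grouped[day].append(shares)
    return {day: mean(values) for day, values in grouped.items()}


day_means = mean_shares_by_day(articles)
top_day = max(day_means, key=day_means.get)  # day with the highest average
print(top_day, day_means[top_day])
```

Because a single viral article can move a daily mean considerably, it helps to read these averages alongside the individual points, as the scatterplot does.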

Are shorter articles better?

Article titles serve as the first point of contact for readers and often determine whether an article is clicked or shared. Understanding how title length relates to share count can reveal whether readers prefer concise headlines or more descriptive ones, and whether length plays a measurable role in article popularity. Additionally, longer articles may provide more depth, while shorter ones may hold attention better. Analyzing the relationship between content length and shares helps determine whether readers tend to share articles that are brief and digestible or more comprehensive in scope.

This Shiny app (https://miartan.shinyapps.io/stat3280app/) explores how title length and content length relate to shares. The title-length scatterplot shows no strong linear relationship between title length and shares; articles with both short and long titles appear across the full range of share counts. The LOESS curve suggests a subtle increase in shares for moderately long titles, but this effect is weak and overshadowed by the high variance. This implies that title content likely matters more than length alone. Length may help shape clarity or tone, but it does not consistently predict article popularity on its own.

The content-length scatterplot also reveals wide variation in shares regardless of article length. The LOESS curve rises slightly for mid-range content lengths, suggesting that extremely short or extremely long articles may be less likely to go viral. However, the overall weak relationship indicates that article length by itself is not a primary driver of popularity. Content richness, clarity, and topical relevance are likely more important than raw word count.
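For readers unfamiliar with LOESS, the idea is to fit a small weighted regression around each point so that nearby observations count more than distant ones. The Python sketch below is a simplified local-linear variant with tricube weights, written only to illustrate the principle. It is not the smoother used in the app, library implementations differ in their defaults, and the data and the names `tricube` and `loess_at` are assumptions made for this example.

```python
def tricube(u):
    """Tricube kernel: 1 at u = 0, falling smoothly to 0 once |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_at(x0, xs, ys, frac=0.5):
    """Estimate y at x0 with a locally weighted straight-line fit."""
    n = len(xs)
    k = max(2, min(n, round(frac * n)))         # points in the local window
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # distance to the k-th nearest point
    # Weight each point by its scaled distance; if the window collapses
    # onto x0 (h == 0), only points exactly at x0 receive weight.
    w = [tricube((x - x0) / h) if h > 0 else float(x == x0) for x in xs]
    sw = sum(w)
    if sw == 0:
        return sum(ys) / n  # fallback when ties leave every weight at zero
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return y_bar + slope * (x0 - x_bar)


# Hypothetical title lengths (in words) and share counts.
title_lengths = [6, 7, 8, 9, 10, 11, 12, 13, 14]
shares = [900, 1100, 1000, 1400, 1600, 1500, 1300, 1200, 1000]

smooth = [loess_at(x, title_lengths, shares) for x in title_lengths]
print([round(v) for v in smooth])
```

The `frac` parameter controls how much of the data each local fit sees: smaller values follow local bumps more closely, while larger values give a flatter curve. With variance as high as in the share data, a gentle rise in the curve should be read cautiously.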

How do publication day and article topic relate to shares?

Publication timing can influence how widely an article circulates, especially on platforms where user activity fluctuates across the week. At the same time, different content domains may naturally draw more attention depending on when readers are most active. To examine these dynamics, this plot visualizes average shares by day of the week for each article topic, allowing both temporal patterns and topical differences to be seen simultaneously. This helps clarify whether certain topics perform better on particular days and whether specific weekday–topic combinations are especially effective for reaching larger audiences.

The resulting line plot shows clear variation across topics and across days. Some topics, such as Lifestyle, maintain consistently high average shares throughout the week. Other topics, such as Business, peak on weekends. These weekend peaks may reflect increased leisure-time browsing or greater social media engagement during non-workdays, when users may have more time to focus on professional or news-driven content.

Do positive- or negative-leaning articles get more shares?

Sentiment polarity captures whether an article’s overall tone is positive or negative. Since emotional content can influence sharing behavior, examining how sentiment polarity aligns with share counts helps determine whether readers prefer uplifting narratives or are more drawn to negative content. Additionally, because headlines frame the article and influence reader expectations, the sentiment of the title may play a role in shares. Evaluating title sentiment polarity helps determine whether emotionally charged headlines correspond to increased sharing.

This Shiny app (https://miartan.shinyapps.io/stat3280app2/) uses a heatmap, histograms, and table summaries to explore how shares relate to sentiment polarity in article titles and content. The heatmap shows that, overall, most article content and titles have a positive tone. There is subtle variation across topics, with some sections (such as Entertainment or World news) exhibiting a particularly high density of positive-tone articles. However, the heatmap does not convey whether these articles draw higher shares.

The histogram reveals the distribution of shares for articles in each cell of the heatmap. Articles with some of the highest shares appear to be positive-tone Lifestyle articles, while those with some of the lowest shares tend to be negative-tone Entertainment articles. However, in several cases, positive and negative articles within the same topic both achieve high shares. These results reinforce the idea that both emotional tone and subject coverage contribute to overall popularity.
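Conceptually, each heatmap cell is a count of articles for one combination of topic and tone. The Python sketch below illustrates that tabulation under the assumption that tone is assigned by the sign of the polarity score. How the app actually bins polarity is not stated here, so the zero cut-off, the sample records, and the `tone` helper are all hypothetical.

```python
from collections import Counter


def tone(polarity):
    """Label a polarity score in [-1, 1] as positive, negative, or neutral."""
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"


# Hypothetical (topic, content polarity) pairs.
articles = [
    ("Lifestyle", 0.21), ("Lifestyle", 0.05),
    ("Entertainment", -0.12), ("Entertainment", 0.30),
    ("World", 0.08), ("World", 0.0),
]

# One count per (topic, tone) cell, as a heatmap would display.
cell_counts = Counter((topic, tone(p)) for topic, p in articles)
for cell, count in sorted(cell_counts.items()):
    print(cell, count)
```

Counts like these describe only how many articles fall in each cell, which is why the share histograms are needed to say anything about popularity.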

How do subjectivity and polarity relate to shares?

While polarity measures how positive or negative an article is, subjectivity measures how opinionated or fact-based an article is. Highly subjective content may provoke stronger reactions, while objective content may convey authority. Exploring how subjectivity relates to shares provides insight into whether readers prefer sharing opinionated takes or factual reporting.

This Shiny app (https://miartan.shinyapps.io/stat3280app3) plots global polarity against global subjectivity, revealing how articles vary in both emotional tone and degree of opinion. The first view is an animation that groups articles by low, medium, and high share counts, enabling a deeper view of how emotional intensity and subjectivity interact with article popularity. The second plot is static and allows users to view the breakdown of article shares within a selected section of points.

The static and animated views show that high-sharing articles tend to occupy a broad region of moderate polarity and subjectivity, rather than clustering at extremes. This suggests that highly polarizing or highly neutral articles are not the sole drivers of virality. Instead, popular articles often balance emotional tone with a degree of subjectivity, offering enough feeling to be engaging while maintaining enough objectivity to be credible. The animation reinforces this pattern by isolating each share group, making the separation clearer.

Brushed selections often reveal consistent patterns: clusters of moderately subjective and moderately polarized articles tend to exhibit higher mean and median shares, while clusters near the origin (neutral polarity, low subjectivity) frequently show modest engagement. This interaction suggests that articles with a moderate emotional and expressive profile may best balance broad appeal with reader interest. The brushed distribution reinforces the idea that engaging tone contributes to popularity, but extreme tone is not required for virality.
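The two views can be approximated numerically: the animation corresponds to splitting articles into share groups, and a brushed selection corresponds to summarizing shares inside a rectangle of polarity and subjectivity values. The Python sketch below assumes the low, medium, and high groups are equal-sized thirds by share rank, which may not match the app's actual cut-offs. The records and the helper names `share_groups` and `brush` are hypothetical.

```python
from statistics import mean, median

# Hypothetical records: (polarity, subjectivity, shares).
articles = [
    (0.02, 0.10, 600),
    (0.05, 0.15, 800),
    (0.15, 0.45, 2500),
    (0.20, 0.50, 4000),
    (0.18, 0.40, 1800),
    (0.60, 0.90, 900),
]


def share_groups(records):
    """Label records low/medium/high by splitting the share ranking into thirds."""
    ranked = sorted(records, key=lambda r: r[2])
    n = len(ranked)
    names = ("low", "medium", "high")
    return [(rec, names[3 * i // n]) for i, rec in enumerate(ranked)]


def brush(records, pol_range, subj_range):
    """Summarize shares for records inside a polarity/subjectivity rectangle."""
    (lo_p, hi_p), (lo_s, hi_s) = pol_range, subj_range
    selected = [s for p, sub, s in records
                if lo_p <= p <= hi_p and lo_s <= sub <= hi_s]
    if not selected:
        return None  # nothing inside the brushed region
    return {"n": len(selected), "mean": mean(selected), "median": median(selected)}


groups = share_groups(articles)
moderate = brush(articles, (0.1, 0.3), (0.3, 0.6))      # moderate tone and opinion
near_origin = brush(articles, (-0.1, 0.1), (0.0, 0.2))  # neutral, low subjectivity
print(moderate, near_origin)
```

Comparing the summaries of two brushed regions in this way mirrors the comparison between moderate clusters and clusters near the origin.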

Does the incorporation of multimedia matter?

The presence of multimedia often influences how readers perceive and engage with online articles. Images can improve comprehension and increase click-through likelihood, especially on fast-moving platforms. Videos are also commonly used to increase engagement, capturing attention but also requiring more viewer commitment. Examining how images and videos correspond with sharing behavior can reveal whether these two forms of multimedia contribute to virality and whether their influence is synergistic.

The first set of boxplots compares articles containing at least one image to those containing none. Articles with images tend to show a wider spread of share values, including higher outliers. This suggests that images may help amplify the reach of articles that already perform well. However, the presence of images does not guarantee high engagement across the board, as low-share articles exist in both categories. Overall, the pattern implies that images may act as a supportive feature that boosts already compelling content rather than driving popularity on their own.

The second set of boxplots shows the distribution of shares across articles with and without embedded videos. The presence of videos does not appear to drive article popularity, as articles with and without videos show a similar spread and median share count. Overall, while images and videos may appeal to users, their presence does not broadly enhance shares.
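The quantities a boxplot draws (quartiles, median, and outliers beyond 1.5 times the interquartile range) can be computed directly, which makes comparisons like these easier to verify. The Python sketch below uses invented share counts and a hypothetical `box_stats` helper. Quartile conventions differ slightly between tools, so these values may not match a given plotting library exactly.

```python
from statistics import quantiles


def box_stats(values):
    """Return quartiles, IQR, and points outside the 1.5 * IQR fences."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical share counts for articles with and without images.
with_images = [500, 800, 1000, 1200, 1500, 2000, 12000]
without_images = [450, 700, 900, 1100, 1400, 1900]

stats_with = box_stats(with_images)
stats_without = box_stats(without_images)
print(stats_with)
print(stats_without)
```

In this toy data the medians are close while only one group has a high outlier, the same kind of pattern described for the image boxplots: a wider upper tail without a clear shift in the typical article.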

Do keywords help increase shares?

Keywords provide a useful lens for understanding what aspects of an article drive engagement. Each article in the dataset includes metadata describing its “best” and “worst” performing keywords, based on how strongly those keywords are associated with high or low share counts across the platform. By comparing average shares for best versus worst keywords across article topics, this visualization highlights how keyword performance varies by domain and whether certain topics benefit more from strong keyword associations. The size of each point also reflects the average number of keywords used in each topic, helping illustrate how keyword richness may contribute to popularity.

The plot reveals clear differences in how strongly topics respond to keyword quality. For every topic, the “best” keywords are associated with substantially higher average shares than the “worst” ones, confirming that keyword relevance matters across the board. However, the magnitude of this gap varies: topics like Business and Other show larger increases from worst to best keyword categories, suggesting that articles in these domains benefit more noticeably from strong keyword alignment. Lifestyle and Social Media show smaller gains, indicating that keyword performance may be less critical for these topics.

Point sizes also show that topics such as Lifestyle and Tech tend to have more keywords on average, but the articles themselves do not produce higher share averages overall. This pattern suggests that richer keyword coverage does not guarantee that the articles will surface to broader audiences. Instead, keyword quality amplifies article popularity more than the number of keywords does, and this effect is especially pronounced in the topics with the largest gaps between best and worst keywords, such as Business and Other.
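The comparison across topics comes down to the gap between average shares for best and worst keywords. The Python sketch below ranks topics by that gap. The numbers are invented purely for illustration and the `keyword_gaps` helper is hypothetical, so the ordering it prints says nothing about the real dataset.

```python
# Hypothetical average shares associated with each topic's "worst" and
# "best" keywords; values are illustrative only.
keyword_shares = {
    "Business": {"worst": 800, "best": 4200},
    "Lifestyle": {"worst": 1500, "best": 2600},
    "Tech": {"worst": 1200, "best": 3000},
}


def keyword_gaps(table):
    """Return (topic, best - worst) pairs sorted by gap, largest first."""
    gaps = {topic: v["best"] - v["worst"] for topic, v in table.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


gaps = keyword_gaps(keyword_shares)
for topic, gap in gaps:
    print(topic, gap)
```

An absolute gap favors topics with high share counts overall, so a ratio of best to worst averages would be a reasonable alternative when topics differ widely in baseline popularity.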

Conclusion

Across all analyses, the factors that most strongly relate to article shares are topic, keyword performance, and emotional tone. Certain topics – especially Lifestyle, World, Business, and Tech – consistently achieve higher engagement, suggesting that subject matter drives much of the variability in popularity. Articles that contain strong, high-impact keywords also show substantially higher share counts, particularly within news-oriented domains. Emotional characteristics such as sentiment polarity and subjectivity contribute meaningfully as well, but more as amplifiers than primary drivers: articles with moderate polarity and moderate subjectivity tend to perform best. In contrast, structural attributes such as title length, content length, and multimedia presence show weaker and more inconsistent relationships with shares. Overall, the results indicate that what an article is about and how well its themes align with audience interest matter far more than stylistic or structural features, emphasizing that topical relevance and keyword strength are the key predictors of article popularity.

References

Fernandes, K., Vinagre, P., & Cortez, P. (2015). A Proactive Intelligent Decision Support System for Predicting the Popularity of Online News. UCI Machine Learning Repository: Online News Popularity Dataset.